Browsing by Author "Kuttel, Michelle Mary"
Now showing 1 - 16 of 16
- Item (Open Access): Accelerated Adjoint Algorithmic Differentiation with Applications in Finance (2017). De Beer, Jarred; Ouwehand, Peter; Kuttel, Michelle Mary. Adjoint Differentiation's (AD) ability to calculate Greeks efficiently and to machine precision, while scaling in constant time with the number of input variables, is attractive for calibration and hedging, where frequent calculations are required. Algorithmic adjoint differentiation tools automatically generate derivative code and provide interesting challenges in both Computer Science and Mathematics. In this dissertation we focus on a manual implementation, with particular emphasis on parallel processing using Graphics Processing Units (GPUs) to accelerate run times. Adjoint differentiation is applied to a Call on Max rainbow option with 3 underlying assets in a Monte Carlo environment. Assets are driven by the Heston stochastic volatility model and implemented using the Milstein discretisation scheme with truncation. The price is calculated along with Deltas and Vegas for each asset, for a total of 6 sensitivities. The application achieves favourable levels of parallelism on all three dimensions exposed by the GPU: Instruction Level Parallelism (ILP), Thread Level Parallelism (TLP), and Single Instruction Multiple Data (SIMD). We estimate that the forward pass of the Milstein discretisation has an ILP of 3.57, which falls within the typical range of 2-4. Monte Carlo simulations are embarrassingly parallel and capable of achieving a high level of concurrency; however, in this context a single kernel running at low occupancy can perform better with a combination of shared memory, vectorized data structures and a high register count per thread. A run on the Intel Xeon CPU with 501 760 paths and 360 time steps takes 48.801 seconds; the GT950 Maxwell GPU completes the same run in 0.115 seconds, achieving a 422× speedup and a throughput of 13 million paths per second. The K40 is capable of achieving better performance.
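The abstract above names the building blocks of the forward pass (Heston dynamics, a Milstein step with truncation, Monte Carlo pricing) but not their formulas. The following is a minimal single-asset sketch of that kind of forward pass, pricing a vanilla call rather than the three-asset Call on Max; the parameter names and values (kappa, theta, xi, rho, and the strike) are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

# Illustrative sketch only: a Milstein step for the Heston variance process with
# truncation, plus a log-Euler step for the asset price. Parameter names and
# values (kappa, theta, xi, rho) are assumptions, not taken from the thesis.

def heston_paths(s0=100.0, v0=0.04, r=0.05, kappa=1.5, theta=0.04, xi=0.5,
                 rho=-0.7, n_paths=100_000, n_steps=360, T=1.0, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0)
    v = np.full(n_paths, v0)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)                      # truncation
        # Milstein step for the variance process
        v = (v + kappa * (theta - v_pos) * dt
             + xi * np.sqrt(v_pos * dt) * z2
             + 0.25 * xi**2 * dt * (z2**2 - 1.0))
        # log-Euler step for the asset price
        s *= np.exp((r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
    return s

if __name__ == "__main__":
    ST = heston_paths(n_paths=50_000)
    print("MC price of a vanilla call, K=100:",
          np.exp(-0.05 * 1.0) * np.maximum(ST - 100.0, 0.0).mean())
```

An adjoint implementation would propagate sensitivities backwards through these same update steps; in the dissertation's GPU setting, many such paths run concurrently per kernel launch.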
- Item (Open Access): Accelerating genomic sequence alignment using high performance reconfigurable computers (2008). McMahon, Peter Leonard; Kuttel, Michelle Mary. Reconfigurable computing technology has progressed to a stage where it is now possible to achieve orders-of-magnitude gains in performance and power efficiency over conventional computer architectures for a subset of high performance computing applications. In this thesis, we investigate the potential of reconfigurable computers to accelerate genomic sequence alignment, specifically for genome sequencing applications. We present a highly optimized implementation of a parallel sequence alignment algorithm for the Berkeley Emulation Engine (BEE2) reconfigurable computer, allowing a single BEE2 to simultaneously align hundreds of sequences. For each reconfigurable processor (FPGA), we demonstrate a 61X speedup versus a state-of-the-art implementation on a modern conventional CPU core, and a 56X improvement in performance-per-Watt. We also show that our implementation is highly scalable, and we provide performance results from a cluster implementation using 32 FPGAs. We conclude that reconfigurable computers provide an excellent platform on which to run sequence alignment, and that clusters of reconfigurable computers will be able to cope far more easily with the vast quantities of data produced by new ultra-high-throughput sequencers.
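The abstract does not name the specific alignment algorithm accelerated on the BEE2. Purely as a point of reference, the sketch below shows a standard Smith-Waterman-style local-alignment scoring kernel, the classic dynamic-programming recurrence that reconfigurable hardware is often used to parallelise; the scoring parameters are arbitrary illustrative choices and may not match the thesis.

```python
# Minimal sketch of Smith-Waterman local alignment scoring (linear gap penalty).
# The scoring parameters (match=2, mismatch=-1, gap=-2) are assumptions for
# illustration; the BEE2/FPGA implementation in the thesis may differ.

def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCU"))
```

Because each cell depends only on its three upper-left neighbours, cells along an anti-diagonal can be computed in parallel, which is one reason this recurrence maps well onto hardware processing elements.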
- Item (Open Access): Acceleration of the noise suppression component of the DUCHAMP source-finder (2015). Badenhorst, Scott James; Kuttel, Michelle Mary; Blyth, Sarah-Louise. The next generation of radio interferometer arrays - the proposed Square Kilometre Array (SKA) and its precursor instruments, the Karoo Array Telescope (MeerKAT) and the Australian Square Kilometre Array Pathfinder (ASKAP) - will produce radio observation survey data orders of magnitude larger than current sizes. The sheer size of the imaged data produced necessitates fully automated solutions to accurately locate and produce useful scientific data for radio sources which are (for the most part) partially hidden within inherently noisy radio observations (source extraction). Automated extraction solutions exist, but are computationally expensive and do not yet scale to the performance required to process large data in practical time-frames. The DUCHAMP software package is one of the most accurate source extraction packages for general (source shape unknown) source finding. DUCHAMP's accuracy is primarily facilitated by the à trous wavelet reconstruction algorithm, a multi-scale smoothing algorithm which suppresses erratic observation noise. This algorithm is the most computationally expensive and memory intensive within DUCHAMP, and consequently improvements to it greatly improve overall DUCHAMP performance. We present a high performance, multithreaded implementation of the à trous algorithm with a focus on 'desktop' computing hardware, to enable standard researchers to do their own accelerated searches. Our solution consists of three main areas of improvement: single-core optimisation, multi-core parallelism and efficient out-of-core computation of large data sets with memory management libraries. Efficient out-of-core computation (data partially stored on disk when primary memory resources are exceeded) of the à trous algorithm accounts for 'desktop' computing's limited fast memory resources by mitigating the performance bottleneck associated with frequent secondary storage access. Although this work focuses on 'desktop' hardware, the majority of the improvements developed are general enough to be used within other high performance computing models. Single-core optimisations improved algorithm accuracy by reducing rounding error and achieved a 4X serial performance increase which scales with the filter size used during reconstruction. Multithreading on a quad-core CPU further increased the performance of the filtering operations within reconstruction to 22X (performance scaling approximately linearly with CPU core count) and achieved a 13X performance increase overall. All evaluated out-of-core memory management libraries performed poorly with parallelism. Single-threaded memory management partially mitigated the slow disk access bottleneck and achieved a 3.6X increase (uniform for all tested large data sets) for filtering operations and a 1.5X increase overall. Faster secondary storage solutions such as Solid State Drives or RAID arrays are required to process large survey data on 'desktop' hardware in practical time-frames.
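For readers unfamiliar with the à trous ("with holes") algorithm mentioned above, here is a rough one-dimensional sketch of the reconstruction idea: smooth repeatedly with an increasingly dilated B3-spline filter, keep only wavelet coefficients that stand above the noise, and rebuild the signal. The MAD-based threshold and edge clamping are typical choices assumed for illustration; DUCHAMP operates on three-dimensional cubes and its exact filtering and thresholding details may differ.

```python
import numpy as np

# 1-D sketch of à trous wavelet reconstruction for noise suppression:
# smooth with an increasingly dilated B3-spline filter, threshold the
# wavelet coefficients at each scale, then rebuild the signal.

B3 = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def atrous_smooth(c, scale):
    spacing = 2 ** scale
    out = np.zeros_like(c)
    n = len(c)
    for k, h in zip(range(-2, 3), B3):
        idx = np.clip(np.arange(n) + k * spacing, 0, n - 1)   # edge clamping
        out += h * c[idx]
    return out

def atrous_reconstruct(signal, n_scales=4, k_sigma=3.0):
    c = signal.astype(float)
    recon = np.zeros_like(c)
    for j in range(n_scales):
        smoothed = atrous_smooth(c, j)
        w = c - smoothed                           # wavelet coefficients, scale j
        sigma = np.median(np.abs(w - np.median(w))) / 0.6745   # robust noise level
        recon += np.where(np.abs(w) > k_sigma * sigma, w, 0.0)
        c = smoothed
    return recon + c                               # add the final smooth residual

rng = np.random.default_rng(1)
x = np.zeros(512)
x[200:220] = 5.0                                   # a "source" buried in noise
noisy = x + rng.normal(0, 1, 512)
print("mean abs error after denoising:", np.abs(atrous_reconstruct(noisy) - x).mean())
```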
- Item (Open Access): Adoption of a visual model for temporal database representation (2016). Shunmugam, Tamindran; Keet, Catharina; Kuttel, Michelle Mary. Today, in the world of information technology, conceptual model representation of database schemas is challenging for users both in the Software Development Life Cycle (SDLC) and the Human-Computer Interaction (HCI) domain. The primary way to resolve this issue, in both domains, is to use a model that is concise, interpretable and clear to understand, yet encompasses all of the information required to clearly define the database. A temporal database is understood as a database capable of supporting reasoning about time-based data; for example, a temporal database can answer questions such as: for what period was Mrs Jones single before she got married? An atemporal database, on the other hand, stores data that is valid today and has no history. In the thesis, I looked at different theoretical temporal visual conceptual models proposed by temporal researchers and aimed, by means of a user survey of business users, to ascertain which models users prefer. I further asked the users, firstly, whether they prefer textual or graphical representations for the entities, attributes and constraints represented by the visual models; secondly, whether there is a preference for a specific graphical icon for the temporal entities; and lastly, whether the users show a preference towards a specific theoretical temporal conceptual model. The methodology employed to reach my goal in this thesis is a series of experiments on business users, with knowledge enhancements after each experiment. Users perform a task and then, based on analysis of the task results, are taught additional temporal aspects to improve their knowledge before the next experiment commences. The ultimate aim was to extract a visual conceptual model preference from business users with enhanced knowledge of temporal aspects. This is the first work done in this field and will thus aid researchers in future work, as they will have a temporal conceptual model that promotes effective communication, understandability and interpretability.
- Item (Open Access): Conformation and cross-protection in Group B Streptococcus serotype III and Streptococcus pneumoniae serotype 14: a molecular modeling study (2019-02-13). Kuttel, Michelle Mary; Ravenscroft, Neil. Although the branched capsular polysaccharides of Streptococcus agalactiae serotype III (GBSIII PS) and Streptococcus pneumoniae serotype 14 (Pn14 PS) differ only in the addition of a terminal sialic acid on the GBSIII PS side chains, these very similar polysaccharides are immunogenically distinct. Our simulations of GBSIII PS, Pn14 PS and the unbranched backbone polysaccharide provide a conformational rationale for the different antigenic epitopes identified for these PS. We find that side chains stabilize the proximal β-D-Glc-(1→6)-β-D-GlcNAc backbone linkage, restricting rotation and creating a well-defined conformational epitope at the branch point. This agrees with the glycotope structure recognized by an anti-GBSIII PS functional monoclonal antibody. We find the same dominant solution conformation for GBSIII and Pn14 PS: aside from the branch point, the backbone is very flexible with a “zig-zag” conformational habit, rather than the helix previously proposed for GBSIII PS. This suggests a common strategy for bacterial evasion of the host immune system: a flexible backbone that is less perceptible to the immune system, combined with conformationally-defined branch points presenting human-mimic epitopes. This work demonstrates how small structural features such as side chains can alter the conformation of a polysaccharide by restricting rotation around backbone linkages.
- Item (Open Access): Designing an effective user interface for the Android tablet environment (2015). Chang, Genevieve; Kuttel, Michelle Mary. With over 1.3 million applications on the Android marketplace, there is increasing competition between mobile applications for customer sales. As usability is a significant factor in an application's success, many mobile developers refer to the Android design guidelines when designing the user interface (UI). These principles help to provide consistency of navigation and aesthetics with the rest of the Android platform. However, misinterpretation of the abstract guidelines may mean that the patterns and elements selected to organise an application's content do not improve its usability. Usability tests would therefore be beneficial to ensure that an application meets its objectives efficiently and improves the user experience. Usability testing is a crucial step in the mobile development process. Many freelance developers, however, have limited resources for usability testing, even though the advantages of usability feedback during initial development stages are clear and can save time and money in the long run. In this thesis, we investigate which method of usability testing is most useful for resource-constrained mobile developers. To test the efficacy of the Android guidelines, three alternate designs of a unique Android tablet application, Glycano, are developed. High-fidelity paper prototypes were presented to end-users for usability testing and to usability experts for heuristic evaluations. Both usability and heuristic tests demonstrated that following the Android guidelines aids user familiarity and learnability. Regardless of the different UI designs of the three mockups, the Android guidelines provided an initial level of usability by providing familiarity to proficient users and an intuitiveness of certain patterns to new users. However, efficiency in building Glycano schematics was an issue that arose consistently. Testing with end-users and experts revealed several navigational problems. Usability experts uncovered more general UI problems than the end-user group, who focused more on the content of the application. More refinements and suggestions of additional features to enhance usability and user experience were provided by the experts. Use of usability experts would therefore be most advantageous in the initial design stages of an application. Feedback from usability testing is, however, also beneficial and is more valuable than not performing any test at all.
- Item (Open Access): Developing analytical tools for saccharides in condensed phases (1999). Kuttel, Michelle Mary; Naidoo, Kevin J. Carbohydrates are conformationally very complex molecules. It is this complexity that lies at the basis of the important roles that these molecules play in many biochemical and biomaterial systems. Moreover, the unusual response of these macromolecules to their environment allows them to play these often critical roles. This is particularly true for solvated carbohydrates. A knowledge of the molecular structure of carbohydrates is essential for an understanding of their function and the molecular basis of their macroscopic properties. The details of solution structure have proven difficult to probe experimentally, but computer simulations are a means of examining solvent structure directly. In this thesis we develop various computational methods for analysing saccharides in solution and in the solid state. These methods are applied to molecular dynamics simulations of maltose, hexa-amylose and a series of cyclodextrins in solution, in order to investigate the effects of water on these polysaccharides. Maltose is investigated because of its potential as a model for larger polysaccharides comprising α(1 → 4)-linked glucose monomers. Solvation was found to affect the conformations of the saccharides studied considerably. In particular, the range of motion around the glycosidic linkage is increased. Comparison of the dynamics around the glycosidic linkages for the various simulations shows that oligosaccharides linked via α(1 → 4) glycosidic linkages have similar behaviour around this linkage. The saccharides studied were found to impose considerable anisotropic structure on the surrounding water, which may give insights into their solution properties. In addition to the studies in solution, a recently developed method for analysing close contacts in crystal structures is applied to crystal structures of cyclodextrin inclusion compounds. It is shown to be a useful tool for investigating hydrogen-bonding patterns in the cyclodextrins.
- Item (Open Access): GPU-based acceleration of radio interferometry point source visibility simulations in the MeqTrees framework (2013). Baxter, Richard Jonathan; Marais, Patrick; Kuttel, Michelle Mary. Modern radio interferometer arrays are powerful tools for obtaining high resolution images of low frequency electromagnetic radiation signals in deep space. While single dish radio telescopes convert the electromagnetic radiation directly into an image of the sky (or sky intensity map), interferometers convert the interference patterns between dishes in the array into samples of the Fourier plane (UV-data or visibilities). A subsequent Fourier transform of the visibilities yields the image of the sky. Conversely, a sky intensity map comprising a collection of point sources can be subjected to an inverse Fourier transform to simulate the corresponding Point Source Visibilities (PSV). Such simulated visibilities are important for testing models of external factors that affect the accuracy of observed data, such as radio frequency interference and interaction with the ionosphere. MeqTrees is a widely used radio interferometry calibration and simulation software package that contains a Point Source Visibility module. Unfortunately, calculation of visibilities is computationally intensive: it requires application of the same Fourier equation to many point sources across multiple frequency bands and time slots. There is great potential for this module to be accelerated by the highly parallel Single-Instruction-Multiple-Data (SIMD) architectures in modern commodity Graphics Processing Units (GPUs). With many traditional high performance computing techniques requiring high entry and maintenance costs, GPUs have proven to be a cost-effective and high-performance parallelisation tool for SIMD problems such as PSV simulations. This thesis presents a GPU/CUDA implementation of the Point Source Visibility calculation within the existing MeqTrees framework. For a large number of sources, this implementation achieves an 18x speed-up over the existing CPU module. With modifications to the MeqTrees memory management system to reduce overheads by incorporating GPU memory operations, speed-ups of 25x are theoretically achievable. Ignoring all serial overheads, and considering only the parallelisable sections of code, speed-ups reach up to 120x.
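A compact sketch of the core calculation described above: each visibility is a sum over point sources of source flux multiplied by a complex exponential of the baseline coordinates (u, v) and the source direction cosines (l, m). This is a simplified two-dimensional form assumed for illustration; it ignores the w-term, frequency and time axes, and the direction-dependent effects that MeqTrees handles.

```python
import numpy as np

# Sketch of a point-source visibility (PSV) calculation: for each (u, v)
# sample, sum flux * exp(-2*pi*i*(u*l + v*m)) over all point sources.

def point_source_visibilities(uv, fluxes, lm):
    """uv: (n_vis, 2) baseline coordinates in wavelengths; fluxes: (n_src,);
    lm: (n_src, 2) direction cosines. Returns complex (n_vis,) visibilities."""
    phase = -2j * np.pi * (uv @ lm.T)          # (n_vis, n_src) phase terms
    return np.exp(phase) @ fluxes              # sum over sources

uv = np.random.default_rng(2).uniform(-1000, 1000, size=(1024, 2))
fluxes = np.array([1.0, 0.5, 0.2])
lm = np.array([[0.0, 0.0], [0.01, -0.02], [-0.015, 0.005]])
print(point_source_visibilities(uv, fluxes, lm)[:3])
```

The same exponential is evaluated independently for every (source, baseline, frequency, time) combination, which is why the abstract describes the problem as a natural fit for SIMD-style GPU parallelism.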
- Item (Open Access): Graphics processing unit accelerated coarse-grained protein-protein docking (2011). Tunbridge, Ian William; Kuttel, Michelle Mary; Gain, JE; Best, RB. In this work, we describe a Graphics Processing Unit (GPU) implementation of the Kim-Hummer coarse-grained model for protein docking simulations, using a Replica Exchange Monte Carlo (REMC) method. Our highly parallel implementation vastly increases the size and time scales accessible to molecular simulation. We describe in detail the complex process of migrating the algorithm to a GPU, as well as the effect of various GPU approaches and optimisations on algorithm speed-up.
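The REMC method mentioned above relies on a simple exchange step between replicas running at different temperatures. The toy sketch below shows only that standard Metropolis swap criterion, with placeholder energies and inverse temperatures; it is not the Kim-Hummer energy function or the GPU implementation described in the work.

```python
import math
import random

# Sketch of the replica-exchange (parallel tempering) swap step: neighbouring
# replicas at different inverse temperatures exchange configurations with
# probability min(1, exp((beta_i - beta_j) * (E_i - E_j))).

def attempt_swaps(energies, betas, rng=random.random):
    """energies[r] is the energy of replica r; betas[i] is the inverse
    temperature of slot i. Returns the replica occupying each slot after
    attempted swaps between neighbouring temperature slots."""
    order = list(range(len(energies)))        # slot -> replica index
    for i in range(len(energies) - 1):
        a, b = order[i], order[i + 1]
        delta = (betas[i] - betas[i + 1]) * (energies[a] - energies[b])
        if delta >= 0 or rng() < math.exp(delta):
            order[i], order[i + 1] = b, a     # accept: exchange configurations
    return order

print(attempt_swaps([-120.0, -100.0, -80.0], [1.0, 0.8, 0.6]))
```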
- Item (Open Access): Lattice Boltzmann liquid simulations on graphics hardware (2014). Clough, Duncan; Gain, James; Kuttel, Michelle Mary. Fluid simulation is widely used in the visual effects industry. The high level of detail required to produce realistic visual effects requires significant computation. Usually, expensive computer clusters are used in order to reduce the time required. However, general purpose Graphics Processing Unit (GPU) computing has potential as a relatively inexpensive way to reduce these simulation times. In recent years, GPUs have been used to achieve enormous speedups via their massively parallel architectures. Within the field of fluid simulation, the Lattice Boltzmann Method (LBM) stands out as a candidate for GPU execution because its grid-based structure is a natural fit for GPU parallelism. This thesis describes the design and implementation of a GPU-based free-surface LBM fluid simulation. Broadly, our approach is to ensure that the steps that perform most of the work in the LBM (the stream and collide steps) make efficient use of GPU resources. We achieve this by removing complexity from the core stream and collide steps and handling interactions with obstacles and tracking of the fluid interface in separate GPU kernels. To determine the efficiency of our design, we perform separate, detailed analyses of the performance of the kernels associated with the stream and collide steps of the LBM. We demonstrate that these kernels make efficient use of GPU resources and achieve speedups of 29.6× and 223.7×, respectively. Our analysis of the overall performance of all kernels shows that significant time is spent performing obstacle adjustment and interface movement as a result of limitations associated with GPU memory accesses. Lastly, we compare our GPU LBM implementation with a single-core CPU LBM implementation. Our results show speedups of up to 81.6×, with no significant differences in output from the simulations on both platforms. We conclude that order-of-magnitude speedups are possible using GPUs to perform free-surface LBM fluid simulations, and that GPUs can therefore significantly reduce the cost of performing high-detail fluid simulations for visual effects.
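For context on the two steps the thesis maps to GPU kernels, here is a minimal CPU-side D2Q9 sketch of collide (BGK relaxation towards an equilibrium distribution) and stream (shifting each distribution along its lattice velocity). Free-surface tracking, obstacles and the GPU memory layout discussed in the thesis are omitted, and the relaxation time and initial flow are arbitrary illustrative values.

```python
import numpy as np

# Minimal D2Q9 Lattice Boltzmann sketch: BGK collision followed by streaming
# with periodic boundaries. Illustrative only; no free surface or obstacles.

c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])          # lattice velocities
w = np.array([4/9] + [1/9]*4 + [1/36]*4)                    # lattice weights

def equilibrium(rho, ux, uy):
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def collide_and_stream(f, tau=0.6):
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f += (equilibrium(rho, ux, uy) - f) / tau                # collide (BGK)
    for i, (cx, cy) in enumerate(c):                         # stream (periodic)
        f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
    return f

nx, ny = 64, 64
f = equilibrium(np.ones((nx, ny)), np.full((nx, ny), 0.05), np.zeros((nx, ny)))
for _ in range(100):
    f = collide_and_stream(f)
print("mass conserved:", np.isclose(f.sum(), nx * ny))
```

Because every cell performs the same arithmetic on local data (collide) and a fixed-offset neighbour copy (stream), both steps map directly onto per-cell GPU threads, which is the property the thesis exploits.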
- Item (Open Access): Molecular modelling of the Streptococcus pneumoniae serogroup 6 capsular polysaccharide antigens (2013). Mathai, Neann Sarah; Kuttel, Michelle Mary; Ravenscroft, Neil. In this thesis, a systematic study of the structural characterization of the capsular polysaccharides of Streptococcus pneumoniae is conducted using molecular modelling methods. S. pneumoniae causes invasive pneumococcal disease (IPD), a leading cause of death in children under five, and the serotypes in group 6 are amongst the most common IPD-causing serotypes. We performed structural characterization of serogroup 6 to understand the structural relationships between serotypes 6A, 6B, 6C and 6D, in an attempt to explain the cross-protection seen within the group. The 6B saccharide was included in the early conjugate vaccine (PCV-7) and has been shown to elicit protection against 6B as well as some cross-protection against 6A. 6A has since been included in later conjugate vaccines in the hope of eliciting stronger protection against 6A and 6C. Molecular Dynamics simulations were used to investigate the conformations of the oligosaccharides, with the aim of elucidating a conformational rationale for why small changes in carbohydrate primary structure result in variable efficacy. We began by examining the Potential of Mean Force (PMF) plots of the disaccharide subunits which make up the serogroup 6 oligosaccharides. The PMFs show the free energy profiles across the torsional angle space of the disaccharides. This conformational information was then used to build the four oligosaccharides on which simulations were conducted. These simulations showed that the serotype pairs 6A/6C and 6B/6D have similar structures.
- Item (Open Access): Parallel fluid dynamics for the film and animation industries (2009). Reid, Ashley; Gain, James; Kuttel, Michelle Mary. The creation of automated fluid effects for film and media using computer simulations is popular, as artist time is reduced and greater realism can be achieved through the use of numerical simulation of physical equations. The fluid effects in today's films and animations have large scenes with high detail requirements, and with these requirements the time taken by such automated approaches is large. To solve this, cluster environments making use of hundreds or more CPUs have been used. This overcomes the processing power and memory limitations of a single computer and allows very large scenes to be created. One of the newer methods for fluid simulation is the Lattice Boltzmann Method (LBM). This is a cellular-automaton type of algorithm, which parallelizes easily. An important part of the process of parallelization is load balancing: the distribution of computation amongst the available computing resources in the cluster. To date, parallelizations of the Lattice Boltzmann method have only made use of static load balancing. Instead, it is possible to make use of dynamic load balancing, which adjusts the computation distribution as the simulation progresses. Here, we investigate the use of the LBM in conjunction with a Volume of Fluid (VOF) surface representation in a parallel environment, with the aim of producing large-scale scenes for the film and animation industries. The VOF method tracks mass exchange between cells of the LBM. In particular, we implement a new dynamic load balancing algorithm to improve the efficiency of the fluid simulation using this method. Fluid scenes from films and animations have two important requirements: the amount of detail and the spatial resolution of the fluid. These aspects of the VOF LBM are explored by considering the time for scene creation using single- and multi-CPU implementations of the method. The scalability of the method is studied by plotting the run time, speedup and efficiency of scene creation against the number of CPUs. From such plots, an estimate is obtained of the feasibility of creating scenes of a given level of detail, and such estimates enable the recommendation of architectures for the creation of specific scenes. Using a parallel implementation of the VOF LBM method, we successfully create large scenes with great detail. In general, considering the significant amounts of communication required for the parallel method, it is shown to scale well, favouring scenes with greater detail. The scalability studies show that the new dynamic load balancing algorithm improves the efficiency of the parallel implementation, but only when using lower numbers of CPUs; for larger numbers of CPUs, the dynamic algorithm reduces the efficiency. We hypothesise that the latter effect can be removed by making use of centralised load-balancing decisions instead of the current decentralised approach. The use of a cluster comprising 200 CPUs is recommended for the production of large scenes of grid size 600³ in a reasonable time frame.
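The abstract does not spell out the dynamic load-balancing algorithm itself. As a toy illustration of the general idea only (here phrased as the centralised decision the abstract hypothesises, not the decentralised scheme actually implemented), one can time each processor's slab of lattice rows, estimate a per-row cost, and re-partition rows so that the predicted work is even.

```python
# Toy sketch of a centralised dynamic load-balancing decision for a
# row-partitioned lattice: re-assign rows in proportion to each CPU's
# measured speed. Names and numbers are illustrative assumptions.

def rebalance(rows_per_cpu, step_times):
    """rows_per_cpu[i]: rows currently owned by CPU i;
    step_times[i]: seconds CPU i spent on the last simulation step."""
    cost_per_row = [t / r for t, r in zip(step_times, rows_per_cpu)]
    speed = [1.0 / c for c in cost_per_row]            # rows per second, per CPU
    total_rows = sum(rows_per_cpu)
    new_rows = [int(total_rows * s / sum(speed)) for s in speed]
    new_rows[-1] += total_rows - sum(new_rows)         # keep the row total exact
    return new_rows

# CPU 0 was slowest last step, so it receives fewer rows next step.
print(rebalance([150, 150, 150, 150], [1.2, 0.8, 1.0, 1.0]))
```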
- Item (Open Access): A parallel multidimensional weighted histogram analysis method (2014). Potgieter, Andrew; Kuttel, Michelle Mary. The Weighted Histogram Analysis Method (WHAM) is a technique used to calculate free energy from molecular simulation data. WHAM recombines biased distributions of samples from multiple Umbrella Sampling simulations to yield an estimate of the global unbiased distribution. The WHAM algorithm iterates two coupled, non-linear equations until convergence at an acceptable level of accuracy. The equations have quadratic time complexity for a single reaction coordinate; however, this increases exponentially with the number of reaction coordinates under investigation, which makes multidimensional WHAM a computationally expensive procedure. There is potential to use general purpose graphics processing units (GPGPUs) to accelerate the execution of the algorithm. Here we develop and evaluate a multidimensional GPGPU WHAM implementation to investigate the potential speed-up attained over its CPU counterpart. In addition, to avoid the cost of multiple Molecular Dynamics simulations and for validation of the implementations, we develop a test system to generate samples analogous to Umbrella Sampling simulations. We observe a maximum problem-size-dependent speed-up of approximately 19x for the GPGPU-optimized WHAM implementation over our single-threaded CPU-optimized version. We find that the WHAM algorithm is amenable to GPU acceleration, which provides the means to study ever more complex molecular systems in reduced time periods.
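For reference, the two coupled WHAM equations for a single reaction coordinate can be iterated as in the sketch below: the unbiased distribution P(ξ) is estimated from the pooled histogram counts and current window free energies, and the window free energies f_i are then recomputed from P(ξ), until the two are self-consistent. The inputs (histogram counts, bias energies, kT) and the convergence tolerance are assumed for illustration; the multidimensional and GPGPU aspects of the thesis are not shown.

```python
import numpy as np

# Sketch of the 1-D WHAM iteration over umbrella-sampling windows.
# counts[i, b]: histogram of window i in bin b; bias[i, b]: bias energy
# U_i at the centre of bin b; n_samples[i]: samples collected in window i.

def wham(counts, bias, n_samples, kT=1.0, tol=1e-7, max_iter=10_000):
    f = np.zeros(len(n_samples))                       # window free energies
    for _ in range(max_iter):
        # Equation 1: unbiased probability in each bin
        denom = (n_samples[:, None] * np.exp((f[:, None] - bias) / kT)).sum(axis=0)
        p = counts.sum(axis=0) / denom
        # Equation 2: window free energies from the unbiased distribution
        f_new = -kT * np.log((p[None, :] * np.exp(-bias / kT)).sum(axis=1))
        f_new -= f_new[0]                              # fix the arbitrary offset
        if np.max(np.abs(f_new - f)) < tol:
            break
        f = f_new
    return p / p.sum(), f
```

In the multidimensional case the bins form a grid over all reaction coordinates, so the per-iteration sums grow rapidly, which is precisely the cost the thesis moves onto the GPU.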
- Item (Open Access): RFI monitoring for the MeerKAT Radio Telescope (2015). Schollar, Christopher; Blyth, Sarah-Louise; Kuttel, Michelle Mary; Schroeder, Anja. South Africa is currently building MeerKAT, a 64-dish radio telescope array, as a precursor for the proposed Square Kilometre Array (SKA). Both telescopes will be located at a remote site in the Karoo with a low level of Radio Frequency Interference (RFI). It is important to maintain this low level of RFI to ensure that MeerKAT has an unobstructed view of the universe across its bandwidth, and the only way to effectively manage the environment is with a record of RFI around the telescope. The RFI management team on the MeerKAT site has multiple tools for monitoring RFI. There is a 7-dish radio telescope array called KAT7 which is used for bi-weekly RFI scans on the horizon. The team has two RFI trailers which provide a mobile spectrum and transient measurement system, as well as commercial handheld spectrum analysers. Most of these tools are only used sporadically during RFI measurement campaigns; none of them provides a continuous record of the environment and none performs automatic RFI detection. Here we design and implement an automatic, continuous RFI monitoring solution for MeerKAT. The monitor consists of an auxiliary antenna on site which continuously captures and stores radio spectra. The statistics of the spectra describe the radio frequency environment and identify potential RFI sources. All of the stored RFI data is accessible over the web: users can view the data using interactive visualisations or download the raw data. The monitor thus provides a continuous record of the RF environment, automatically detects RFI and makes this information easily accessible. This RFI monitor functioned successfully for over a year with minimal human intervention and assisted RFI management on site during RFI campaigns. The data has proved to be accurate, the RFI detection algorithm has been shown to be effective, and the web visualisations have been tested by MeerKAT engineers and astronomers and proven to be useful. The monitor represents a clear improvement over previous monitoring solutions used by MeerKAT and is an effective site management tool.
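The abstract does not describe the detection algorithm itself. Shown below, purely as an assumed illustration of statistics-based flagging, is a common baseline approach: estimate each channel's quiescent power level robustly with the median and flag samples that exceed it by several robust standard deviations (via the median absolute deviation). The thesis's actual algorithm may differ.

```python
import numpy as np

# Illustrative median/MAD threshold flagger for captured power spectra.
# Not the algorithm from the thesis; parameter k is an arbitrary choice.

def flag_rfi(spectra_db, k=5.0):
    """spectra_db: (n_spectra, n_channels) power spectra in dB.
    Returns a boolean mask, True where RFI is suspected."""
    baseline = np.median(spectra_db, axis=0)            # per-channel quiet level
    mad = np.median(np.abs(spectra_db - baseline), axis=0)
    sigma = 1.4826 * mad + 1e-12                         # MAD -> robust std dev
    return spectra_db > baseline + k * sigma

rng = np.random.default_rng(3)
spectra = rng.normal(-90.0, 1.0, size=(200, 1024))       # quiet band, in dB
spectra[50:60, 300] += 30.0                              # an intermittent carrier
print("flagged samples:", flag_rfi(spectra).sum())
```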
- Item (Open Access): Simulations of carbohydrate conformational dynamics and thermodynamics (2003). Kuttel, Michelle Mary; Naidoo, Kevin J. In this thesis, free energy calculations are employed to establish the conformational freedom of selected carbohydrates. The focus is on the biologically important (1-4)-linked glucans, although the α(1-1)α and α(1-6) linkages are also investigated. Two principal types of potential of mean force (PMF) calculation are used: free energy calculations for rotations about dihedral angles, and end-to-end distance free energy calculations for the extension and compression of oligosaccharide strands.
- Item (Open Access): Towards realistic interactive sand: a GPU-based framework (2009). Longmore, Juan-Pierre; Marais, Patrick; Kuttel, Michelle Mary. Many real-time computer games contain virtual worlds built upon terrestrial landscapes, in particular "sandy" terrains such as deserts and beaches. These terrains often contain large quantities of granular material, including sand, soil, rubble, and gravel. Allowing other environmental elements, such as trees or bodies of water, as well as players, to interact naturally and realistically with sand is an important milestone for achieving realism in games. In the past, game developers have resorted to approximating sand with flat, textured surfaces that are static, non-granular, and do not behave like the physical material they model. A reasonable expectation is that sand be granular in its composition and governed by the laws of physics in its behaviour. However, for a single PC user, physics-based models are too computationally expensive to simulate and animate in real time. An alternative is to use computer clusters to handle the numerically intensive simulation, but at the loss of single-user affordability and real-time interactivity. Instead, we propose a GPU-based simulation framework that exploits the massive computational parallelism of a modern GPU to achieve interactive frame rates on a single PC. We base our method on a discrete element approach that represents each sand granule as a rigid arrangement of particles. Our model shows highly dynamic phenomena, such as splashing and avalanching, as well as static dune formation. Moreover, by utilising standard metrics taken from granular material science, we show that the simulated sand behaves in accordance with previous numerical and experimental research. We also support general rigid bodies in the simulation by automated particle-based sampling of their surfaces. This allows sand to interact naturally with its environment without extensive modification to the underlying physics engine. The generality of our physics framework also allows for real-time physically-based rigid body simulation sans sand, as demonstrated in our testing. Finally, we describe an accelerated real-time method for lighting sand that supports both self-shadowing and environmental shadowing effects.